Optimization in Julia

Biostat/Biomath M257

Author

Dr. Hua Zhou @ UCLA

Published

May 23, 2024

System information (for reproducibility):

versioninfo()
Julia Version 1.10.3
Commit 0b4590a5507 (2024-04-30 10:59 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: macOS (arm64-apple-darwin22.4.0)
  CPU: 12 × Apple M2 Max
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, apple-m1)
Threads: 8 default, 0 interactive, 4 GC (on 8 virtual cores)
Environment:
  JULIA_NUM_THREADS = 8
  JULIA_EDITOR = code

Load packages:

using Pkg

Pkg.activate(pwd())
Pkg.instantiate()
Pkg.status()
  Activating project at `~/Documents/github.com/ucla-biostat-257/2024spring/slides/22-juliaopt`
Precompiling project...
  ? OptimizationBase → OptimizationMTKExt
Status `~/Documents/github.com/ucla-biostat-257/2024spring/slides/22-juliaopt/Project.toml`
  [1e616198] COSMO v0.8.9
  [13f3f980] CairoMakie v0.12.2
  [61c947e1] Clarabel v0.8.1
  [f65535da] Convex v0.16.0
  [31c24e10] Distributions v0.25.108
  [5789e2e9] FileIO v1.16.3
  [f6369f11] ForwardDiff v0.10.36
  [2e9cd046] Gurobi v1.3.0
  [b99e6be6] Hypatia v0.8.1
  [916415d5] Images v0.26.1
  [b6b21f68] Ipopt v1.6.2
  [4076af6c] JuMP v1.22.1
  [2ddba703] Juniper v0.9.2
  [b8f27783] MathOptInterface v1.30.0
  [1ec41992] MosekTools v0.15.1
  [7f7a1694] Optimization v3.25.1
  [fd9f6733] OptimizationMOI v0.4.2
  [4e6fcdb7] OptimizationNLopt v0.2.0
  [36348300] OptimizationOptimJL v0.3.1
  [c946c3f1] SCS v2.0.0
  [276daf66] SpecialFunctions v2.4.0
  [37e2e46d] LinearAlgebra
  [9a3f8284] Random
  [2f01184e] SparseArrays v1.10.0
  [10745b16] Statistics v1.10.0

1 Optimization in Julia

This lecture gives an overview of some optimization tools in Julia.

1.1 Flowchart

  • Statisticians solve optimization problems in daily work: maximum likelihood estimation, machine learning, …

  • Categories of optimization problems:

    1. Problems with analytical solutions: least squares, principal component analysis, canonical correlation analysis, …

    2. Problems subject to Disciplined Convex Programming (DCP): linear programming (LP), quadratic programming (QP), second-order cone programming (SOCP), semidefinite programming (SDP), and geometric programming (GP).

    3. Nonlinear programming (NLP): Newton-type algorithms, Fisher scoring algorithm, EM algorithm, MM algorithms.

    4. Large scale optimization: ADMM, SGD, …

Flowchart
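As a quick illustration of category 1, least squares has a closed-form solution via the normal equations. The following sketch uses assumed toy data (not from the lecture):

```julia
using LinearAlgebra, Random

# Category 1 sketch (assumed toy data): least squares has the
# closed-form solution of the normal equations X'X β = X'y.
Random.seed!(1)
n, p = 50, 3
X, y = randn(n, p), randn(n)
β̂ = (X'X) \ (X'y)
# the residual is orthogonal to the column space of X
@show norm(X' * (y - X * β̂))
```

In practice one solves least squares via a QR or Cholesky factorization (`X \ y` in Julia) rather than forming `X'X` explicitly.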

1.2 Modeling tools and solvers

Getting familiar with good optimization software broadens the scope and scale of problems we are able to solve in statistics. The following table lists some of the best optimization software.

LP MILP SOCP MISOCP SDP GP NLP MINLP R Matlab Julia Python Cost
modeling tools
cvx x x x x x x x x x A
Convex.jl x x x x x x O
Optimization.jl x x x x x x x O
JuMP.jl x x x x x x x O
MathOptInterface.jl x x x x x x x O
convex solvers
Mosek x x x x x x x x x x x A
Gurobi x x x x x x x x A
CPLEX x x x x x x x x A
SCS x x x x x x O
COSMO.jl x x x x O
Hypatia.jl (more cones) x x x x O
NLP solvers
NLopt x x x x x x O
Ipopt x x x x x x O
KNITRO x x x x x x x x $
O: open source, A: free academic license, $: commercial
  • For a more extended list of solvers, see here.

  • Difference between modeling tools and solvers

    • Modeling tools such as cvx (for Matlab) and Convex.jl (Julia analog of cvx) implement the disciplined convex programming (DCP) paradigm proposed by Grant and Boyd (2008) http://stanford.edu/~boyd/papers/disc_cvx_prog.html. DCP prescribes a set of simple rules from which users can construct convex optimization problems easily.

    • Solvers (Mosek, Gurobi, Cplex, SCS, COSMO, Hypatia, …) are concrete software implementations of optimization algorithms. My favorite ones are: Mosek/Gurobi/SCS for DCP and Ipopt/NLopt for nonlinear programming. Mosek and Gurobi are commercial software but free for academic use. SCS/Ipopt/NLopt are open source.

    • Modeling tools are solver agnostic: the same model can typically be dispatched to a variety of solvers, so users do not have to worry about a specific solver's interface.

  • If you want to install the commercial solvers Gurobi or Mosek, instructions are below:

    • Gurobi: 1. Download Gurobi at link. 2. Register an account and request free academic license at link. 3. Run grbgetkey XXXXXXXXX command on terminal as suggested. It’ll retrieve a license file and put it under a specified folder, e.g., /Users/huazhou/Documents/Gurobi/gurobi.lic. 4. Set up the environmental variables. On my machine, I put following two lines in the ~/.julia/config/startup.jl file: ENV["GUROBI_HOME"] = "/Library/gurobi1101/macos_universal2" and ENV["GRB_LICENSE_FILE"] = "/Users/huazhou/Documents/Gurobi/gurobi.lic".
    • Mosek: 1. Request free academic license at link. The license file will be sent to your edu email within minutes. Check Spam folder if necessary. 2. Put the license file at the default location ~/mosek/.
    • Install the Julia packages Convex.jl, SCS.jl, Gurobi.jl, MosekTools.jl, MathOptInterface.jl, NLopt.jl, and Ipopt.jl. The package wrappers themselves are open source.

1.3 DCP Using Convex.jl

Standard convex problem classes like LP (linear programming), QP (quadratic programming), SOCP (second-order cone programming), SDP (semidefinite programming), and GP (geometric programming), are becoming a technology.

DCP Hierarchy

1.3.1 Example: microbiome regression analysis

We illustrate optimization tools in Julia using microbiome analysis as an example.

16S microbiome sequencing technology generates sequence counts of various organisms (OTUs, operational taxonomic units) in samples.

Microbiome Data

For statistical analysis, counts are normalized into proportions for each sample, resulting in a covariate matrix \(\mathbf{X}\) with all rows summing to 1. For identifiability, we need to add a sum-to-zero constraint to the regression coefficients. In other words, we need to solve a constrained least squares problem
\[ \text{minimize} \frac{1}{2} \|\mathbf{y} - \mathbf{X} \beta\|_2^2 \] subject to the constraint \(\sum_{j=1}^p \beta_j = 0\). For simplicity we ignore intercept and non-OTU covariates in this presentation.

Let’s first generate an artificial data set.

using Random, LinearAlgebra, SparseArrays

Random.seed!(257) # seed

n, p = 100, 50
X = rand(n, p)
# scale each row of X to sum to 1
lmul!(Diagonal(1 ./ vec(sum(X, dims=2))), X)
# true β is a sparse vector with about 10% non-zero entries
β = sprandn(p, 0.1) 
y = X * β + randn(n);

1.3.2 Sum-to-zero regression

The sum-to-zero constrained least squares is a standard quadratic programming (QP) problem, so it should be solved easily by any QP solver.
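Before calling any solver, note that this particular QP also admits a closed-form answer. The following sketch (with assumed toy data, not the lecture's X and y) solves the KKT system directly, which is handy for validating solver output on small problems:

```julia
using LinearAlgebra, Random

# KKT sketch (assumed toy data): minimize 0.5‖y - Xβ‖² subject to 1'β = 0
# has stationarity condition X'(Xβ - y) + ν·1 = 0, giving the linear system
#   [X'X  1; 1'  0] [β; ν] = [X'y; 0].
Random.seed!(257)
n, p = 30, 5
X, y = rand(n, p), randn(n)
K = [X'X ones(p); ones(1, p) 0]
sol = K \ [X'y; 0]
β̂, ν = sol[1:p], sol[end]
@show sum(β̂)                         # ≈ 0: constraint satisfied
@show norm(X' * (X * β̂ - y) .+ ν)    # ≈ 0: stationarity satisfied
```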

1.3.2.1 Modeling using Convex.jl

We use the Convex.jl package to model this QP problem. For a complete list of operations supported by Convex.jl, see https://jump.dev/Convex.jl/stable/operations/.

using Convex

β̂cls = Variable(size(X, 2))
problem = minimize(0.5sumsquares(y - X * β̂cls)) # objective
problem.constraints = [sum(β̂cls) == 0]; # constraint
problem
Problem statistics
  problem is DCP         : true
  number of variables    : 1 (50 scalar elements)
  number of constraints  : 1 (1 scalar elements)
  number of coefficients : 5_103
  number of atoms        : 7

Solution summary
  termination status : OPTIMIZE_NOT_CALLED
  primal status      : NO_SOLUTION
  dual status        : NO_SOLUTION

Expression graph
  minimize
   └─ * (convex; positive)
      ├─ [0.5;;]
      └─ qol (convex; positive)
         ├─ + (affine; real)
         │  ├─ …
         │  └─ …
         └─ [1;;]
  subject to
   └─ == constraint (affine)
      └─ + (affine; real)
         ├─ sum (affine; real)
         │  └─ …
         └─ [0;;]

1.3.2.2 Mosek

We first use the Mosek solver to solve this QP.

using MosekTools, MathOptInterface
const MOI = MathOptInterface

solver = MOI.OptimizerWithAttributes(Mosek.Optimizer, "LOG" => 1)

@time solve!(problem, solver)
┌ Info: [Convex.jl] Compilation finished: 3.2 seconds, 831.109 MiB of memory allocated
└ @ Convex /Users/huazhou/.julia/packages/Convex/y7lu0/src/solution.jl:107
Problem
  Name                   :                 
  Objective sense        : minimize        
  Type                   : CONIC (conic optimization problem)
  Constraints            : 1               
  Affine conic cons.     : 1 (102 rows)
  Disjunctive cons.      : 0               
  Cones                  : 0               
  Scalar variables       : 51              
  Matrix variables       : 0               
  Integer variables      : 0               

Optimizer started.
Presolve started.
Linear dependency checker started.
Linear dependency checker terminated.
Eliminator started.
Freed constraints in eliminator : 0
Eliminator terminated.
Eliminator - tries                  : 1                 time                   : 0.00            
Lin. dep.  - tries                  : 1                 time                   : 0.00            
Lin. dep.  - primal attempts        : 1                 successes              : 1               
Lin. dep.  - dual attempts          : 0                 successes              : 0               
Lin. dep.  - primal deps.           : 0                 dual deps.             : 0               
Presolve terminated. Time: 0.00    
Optimizer  - threads                : 12              
Optimizer  - solved problem         : the dual        
Optimizer  - Constraints            : 50              
Optimizer  - Cones                  : 2               
Optimizer  - Scalar variables       : 104               conic                  : 104             
Optimizer  - Semi-definite variables: 0                 scalarized             : 0               
Factor     - setup time             : 0.00            
Factor     - dense det. time        : 0.00              GP order time          : 0.00            
Factor     - nonzeros before factor : 1275              after factor           : 1275            
Factor     - dense dim.             : 0                 flops                  : 3.03e+05        
ITE PFEAS    DFEAS    GFEAS    PRSTATUS   POBJ              DOBJ              MU       TIME  
0   2.9e+00  2.1e-01  1.7e+00  0.00e+00   3.535533906e-01   -3.535533906e-01  1.0e+00  0.00  
1   1.3e-01  8.8e-03  2.6e-01  -9.42e-01  5.476668348e+00   1.694192885e+01   4.3e-02  0.00  
2   1.9e-02  1.3e-03  3.4e-02  -2.35e-01  1.848385531e+01   2.813003558e+01   6.4e-03  0.00  
3   7.7e-04  5.5e-05  4.1e-04  5.50e-01   2.814808382e+01   2.895785952e+01   2.6e-04  0.00  
4   5.7e-05  4.0e-06  5.9e-06  9.72e-01   2.880931999e+01   2.884037712e+01   1.9e-05  0.00  
5   4.5e-10  4.0e-12  3.1e-15  9.98e-01   2.884672314e+01   2.884672314e+01   5.7e-14  0.00  
Optimizer terminated. Time: 0.00    

  3.938174 seconds (14.71 M allocations: 999.626 MiB, 3.76% gc time, 99.10% compilation time)
Problem statistics
  problem is DCP         : true
  number of variables    : 1 (50 scalar elements)
  number of constraints  : 1 (1 scalar elements)
  number of coefficients : 5_103
  number of atoms        : 7

Solution summary
  termination status : OPTIMAL
  primal status      : FEASIBLE_POINT
  dual status        : FEASIBLE_POINT
  objective value    : 28.8467

Expression graph
  minimize
   └─ * (convex; positive)
      ├─ [0.5;;]
      └─ qol (convex; positive)
         ├─ + (affine; real)
         │  ├─ …
         │  └─ …
         └─ [1;;]
  subject to
   └─ == constraint (affine)
      └─ + (affine; real)
         ├─ sum (affine; real)
         │  └─ …
         └─ [0;;]
# Check the status, optimal value, and minimizer of the problem
problem.status, problem.optval, β̂cls.value
(MathOptInterface.OPTIMAL, 28.846723126126648, [5.50999308911286; 2.3835671892547334; … ; -3.5233708752162136; -14.863428523256783;;])
# check constraint satisfaction
sum(β̂cls.value)
2.4158453015843406e-13

1.3.2.3 Gurobi

Switch to Gurobi solver:

using Gurobi

solver = MOI.OptimizerWithAttributes(Gurobi.Optimizer, "OutputFlag" => 1)

@time solve!(problem, solver)
Set parameter Username
Academic license - for non-commercial use only - expires 2025-05-07
Gurobi Optimizer version 11.0.2 build v11.0.2rc0 (mac64[arm] - Darwin 23.5.0 23F79)

CPU model: Apple M2 Max
Thread count: 12 physical cores, 12 logical processors, using up to 12 threads

Optimize a model with 103 rows, 153 columns and 5155 nonzeros
Model fingerprint: 0xceaf534d
Model has 1 quadratic constraint
Coefficient statistics:
  Matrix range     [3e-05, 1e+00]
  QMatrix range    [1e+00, 1e+00]
  Objective range  [5e-01, 5e-01]
  Bounds range     [0e+00, 0e+00]
  RHS range        [3e-02, 3e+00]
Presolve removed 1 rows and 1 columns
Presolve time: 0.00s
Presolved: 102 rows, 152 columns, 5152 nonzeros
Presolved model has 1 second-order cone constraint
Ordering time: 0.00s

Barrier statistics:
 Free vars  : 50
 AA' NZ     : 5.150e+03
 Factor NZ  : 5.253e+03
 Factor Ops : 3.590e+05 (less than 1 second per iteration)
 Threads    : 1

                  Objective                Residual
Iter       Primal          Dual         Primal    Dual     Compl     Time
   0   7.13616591e+00  0.00000000e+00  1.43e+01 1.39e-02  7.28e-02     0s
   1   3.01025068e+00  6.64661859e-01  5.49e+00 2.91e-05  2.46e-02     0s
   2   4.03572275e+00  3.01841471e+00  3.28e+00 3.53e-06  2.31e-02     0s
   3   1.37424301e+01  9.41379966e+00  6.97e-01 9.61e-07  5.61e-02     0s
   4   2.24325118e+01  2.47075686e+01  2.01e-01 3.44e-06  2.72e-02     0s
   5   3.23085168e+01  1.89965179e+01  2.21e-07 2.88e-05  1.31e-01     0s
   6   2.97349069e+01  2.87096350e+01  7.04e-10 4.39e-06  1.00e-02     0s
   7   2.88512338e+01  2.88129938e+01  3.80e-11 2.26e-07  3.75e-04     0s
   8   2.88475708e+01  2.88462464e+01  2.01e-11 2.71e-08  1.30e-05     0s
   9   2.88467247e+01  2.88467190e+01  3.89e-12 4.99e-09  6.30e-08     0s

Barrier solved model in 9 iterations and 0.00 seconds (0.01 work units)
Optimal objective 2.88467247e+01


User-callback calls 74, time in user-callback 0.00 sec
  2.202765 seconds (6.19 M allocations: 412.770 MiB, 2.61% gc time, 98.31% compilation time)
┌ Info: [Convex.jl] Compilation finished: 2.04 seconds, 379.766 MiB of memory allocated
└ @ Convex /Users/huazhou/.julia/packages/Convex/y7lu0/src/solution.jl:107
Problem statistics
  problem is DCP         : true
  number of variables    : 1 (50 scalar elements)
  number of constraints  : 1 (1 scalar elements)
  number of coefficients : 5_103
  number of atoms        : 7

Solution summary
  termination status : OPTIMAL
  primal status      : FEASIBLE_POINT
  dual status        : NO_SOLUTION
  objective value    : 28.8467

Expression graph
  minimize
   └─ * (convex; positive)
      ├─ [0.5;;]
      └─ qol (convex; positive)
         ├─ + (affine; real)
         │  ├─ …
         │  └─ …
         └─ [1;;]
  subject to
   └─ == constraint (affine)
      └─ + (affine; real)
         ├─ sum (affine; real)
         │  └─ …
         └─ [0;;]
# Check the status, optimal value, and minimizer of the problem
problem.status, problem.optval, β̂cls.value
(MathOptInterface.OPTIMAL, 28.8467247421248, [5.51045752823481; 2.383773303349426; … ; -3.5236693367952014; -14.86470150370526;;])
# check constraint satisfaction
sum(β̂cls.value)
-8.881784197001252e-13

1.3.2.4 COSMO

Switch to COSMO solver (pure Julia implementation):

# Use COSMO solver
using COSMO

solver = MOI.OptimizerWithAttributes(COSMO.Optimizer, "max_iter" => 50_000, "verbose" => false)

@time solve!(problem, solver)
┌ Info: [Convex.jl] Compilation finished: 2.75 seconds, 463.817 MiB of memory allocated
└ @ Convex /Users/huazhou/.julia/packages/Convex/y7lu0/src/solution.jl:107
  7.953742 seconds (36.63 M allocations: 2.456 GiB, 5.06% gc time, 97.22% compilation time)
Problem statistics
  problem is DCP         : true
  number of variables    : 1 (50 scalar elements)
  number of constraints  : 1 (1 scalar elements)
  number of coefficients : 5_103
  number of atoms        : 7

Solution summary
  termination status : OPTIMAL
  primal status      : FEASIBLE_POINT
  dual status        : FEASIBLE_POINT
  objective value    : 28.8467

Expression graph
  minimize
   └─ * (convex; positive)
      ├─ [0.5;;]
      └─ qol (convex; positive)
         ├─ + (affine; real)
         │  ├─ …
         │  └─ …
         └─ [1;;]
  subject to
   └─ == constraint (affine)
      └─ + (affine; real)
         ├─ sum (affine; real)
         │  └─ …
         └─ [0;;]
# Check the status, optimal value, and minimizer of the problem
problem.status, problem.optval, β̂cls.value
(MathOptInterface.OPTIMAL, 28.84672523879072, [5.509993091037829; 2.383567194187562; … ; -3.523370878385065; -14.863428523250539;;])

We see COSMO has a looser criterion for constraint satisfaction, resulting in a slightly higher objective value.

sum(β̂cls.value)
-6.050484557817981e-10

1.3.2.5 SCS

Switch to the open source SCS solver:

# Use SCS solver
using SCS

solver = MOI.OptimizerWithAttributes(SCS.Optimizer, "verbose" => 5000)

@time solve!(problem, solver)
┌ Info: [Convex.jl] Compilation finished: 2.44 seconds, 285.092 MiB of memory allocated
└ @ Convex /Users/huazhou/.julia/packages/Convex/y7lu0/src/solution.jl:107
------------------------------------------------------------------
           SCS v3.2.4 - Splitting Conic Solver
    (c) Brendan O'Donoghue, Stanford University, 2012
------------------------------------------------------------------
problem:  variables n: 51, constraints m: 103
cones:    z: primal zero / dual free vars: 1
      q: soc vars: 102, qsize: 1
settings: eps_abs: 1.0e-04, eps_rel: 1.0e-04, eps_infeas: 1.0e-07
      alpha: 1.50, scale: 1.00e-01, adaptive_scale: 1
      max_iters: 100000, normalize: 1, rho_x: 1.00e-06
      acceleration_lookback: 10, acceleration_interval: 10
      compiled with openmp parallelization enabled
lin-sys:  sparse-direct-amd-qdldl
      nnz(A): 5052, nnz(P): 0
------------------------------------------------------------------
 iter | pri res | dua res |   gap   |   obj   |  scale  | time (s)
------------------------------------------------------------------
     0| 8.17e+00  3.65e+00  3.88e+01  1.86e+01  1.00e-01  1.21e-03 
   250| 1.53e-03  1.61e-04  8.02e-02  2.88e+01  4.93e-03  4.85e-03 
   325| 8.77e-08  2.27e-08  2.20e-09  2.88e+01  4.93e-03  5.91e-03 
------------------------------------------------------------------
status:  solved
timings: total: 5.91e-03s = setup: 5.78e-04s + solve: 5.33e-03s
     lin-sys: 3.45e-03s, cones: 9.30e-05s, accel: 6.84e-05s
------------------------------------------------------------------
objective = 28.846724
------------------------------------------------------------------
  3.628240 seconds (7.90 M allocations: 526.526 MiB, 2.05% gc time, 99.32% compilation time)
Problem statistics
  problem is DCP         : true
  number of variables    : 1 (50 scalar elements)
  number of constraints  : 1 (1 scalar elements)
  number of coefficients : 5_103
  number of atoms        : 7

Solution summary
  termination status : OPTIMAL
  primal status      : FEASIBLE_POINT
  dual status        : FEASIBLE_POINT
  objective value    : 28.8467

Expression graph
  minimize
   └─ * (convex; positive)
      ├─ [0.5;;]
      └─ qol (convex; positive)
         ├─ + (affine; real)
         │  ├─ …
         │  └─ …
         └─ [1;;]
  subject to
   └─ == constraint (affine)
      └─ + (affine; real)
         ├─ sum (affine; real)
         │  └─ …
         └─ [0;;]
# Check the status, optimal value, and minimizer of the problem
problem.status, problem.optval, β̂cls.value
(MathOptInterface.OPTIMAL, 28.84672444452367, [5.50999308905218; 2.383567188535443; … ; -3.523370874664315; -14.86342852324446;;])
# check constraint satisfaction
sum(β̂cls.value)
1.3303136370268476e-9

1.3.2.6 Hypatia

Switch to the open source Hypatia (pure Julia) solver:

# Use Hypatia solver
using Hypatia

solver = MOI.OptimizerWithAttributes(Hypatia.Optimizer, "verbose" => 1)

@time solve!(problem, solver)
┌ Info: [Convex.jl] Compilation finished: 0.68 seconds, 133.046 MiB of memory allocated
└ @ Convex /Users/huazhou/.julia/packages/Convex/y7lu0/src/solution.jl:107

 iter        p_obj        d_obj |  abs_gap    x_feas    z_feas |      tau       kap        mu | dir_res     prox  step     alpha
    0   8.5628e-01  -2.9196e-01 | 2.00e+00  5.59e-02  5.59e-01 | 1.00e+00  1.00e+00  1.00e+00 |
    1   4.3997e+00   7.8189e+00 | 1.45e-01  3.04e-02  3.04e-01 | 1.84e-01  8.43e-01  1.00e-01 | 8.9e-16  5.5e-01  co-a  9.00e-01
    2   1.4279e+01   2.2121e+01 | 1.07e-02  1.19e-02  1.19e-01 | 4.70e-02  3.90e-01  9.67e-03 | 3.6e-16  8.9e-01  co-a  9.00e-01
    3   2.3867e+01   2.7924e+01 | 1.80e-03  3.59e-03  3.59e-02 | 2.34e-02  9.81e-02  1.36e-03 | 4.4e-15  6.8e-01  co-a  8.50e-01
    4   2.8645e+01   2.8834e+01 | 4.44e-05  1.31e-04  1.31e-03 | 1.92e-02  3.73e-03  3.86e-05 | 1.1e-15  8.5e-01  co-a  9.70e-01
    5   2.8845e+01   2.8846e+01 | 8.85e-07  1.29e-06  1.29e-05 | 1.96e-02  1.53e-05  3.95e-07 | 2.9e-15  3.3e-01  co-a  9.90e-01
    6   2.8847e+01   2.8847e+01 | 8.17e-09  1.42e-08  1.42e-07 | 1.77e-02  1.44e-07  3.57e-09 | 1.2e-12  6.5e-01  co-a  9.90e-01
    7   2.8847e+01   2.8847e+01 | 6.01e-12  1.14e-11  1.70e-10 | 1.48e-02  1.94e-10  2.96e-12 | 5.4e-11  2.9e-02  co-a  9.99e-01
optimal solution found; terminating

status is Optimal after 7 iterations and 3.3 seconds

  6.321179 seconds (24.46 M allocations: 1.641 GiB, 3.03% gc time, 99.51% compilation time: <1% of which was recompilation)
Problem statistics
  problem is DCP         : true
  number of variables    : 1 (50 scalar elements)
  number of constraints  : 1 (1 scalar elements)
  number of coefficients : 5_103
  number of atoms        : 7

Solution summary
  termination status : OPTIMAL
  primal status      : FEASIBLE_POINT
  dual status        : FEASIBLE_POINT
  objective value    : 28.8467

Expression graph
  minimize
   └─ * (convex; positive)
      ├─ [0.5;;]
      └─ qol (convex; positive)
         ├─ + (affine; real)
         │  ├─ …
         │  └─ …
         └─ [1;;]
  subject to
   └─ == constraint (affine)
      └─ + (affine; real)
         ├─ sum (affine; real)
         │  └─ …
         └─ [0;;]
# Check the status, optimal value, and minimizer of the problem
problem.status, problem.optval, β̂cls.value
(MathOptInterface.OPTIMAL, 28.846723110018917, [5.50999308964372; 2.383567188622413; … ; -3.523370874978272; -14.863428523570764;;])
# check constraint satisfaction
sum(β̂cls.value)
2.842170943040401e-14

1.3.2.7 Clarabel

Switch to the open source Clarabel (pure Julia) solver:

# Use Clarabel solver
using Clarabel

solver = MOI.OptimizerWithAttributes(Clarabel.Optimizer)

@time solve!(problem, solver)
┌ Info: [Convex.jl] Compilation finished: 0.46 seconds, 67.236 MiB of memory allocated
└ @ Convex /Users/huazhou/.julia/packages/Convex/y7lu0/src/solution.jl:107
-------------------------------------------------------------
           Clarabel.jl v0.8.1  -  Clever Acronym              
                   (c) Paul Goulart                          
                University of Oxford, 2022                   
-------------------------------------------------------------

problem:
  variables     = 51
  constraints   = 103
  nnz(P)        = 0
  nnz(A)        = 5052
  cones (total) = 2
    : Zero        = 1,  numel = 1
    : SecondOrder = 1,  numel = 102

settings:
  linear algebra: direct / qdldl, precision: Float64
  max iter = 200, time limit = Inf,  max step = 0.990
  tol_feas = 1.0e-08, tol_gap_abs = 1.0e-08, tol_gap_rel = 1.0e-08,
  static reg : on, ϵ1 = 1.0e-08, ϵ2 = 4.9e-32
  dynamic reg: on, ϵ = 1.0e-13, δ = 2.0e-07
  iter refine: on, reltol = 1.0e-13, abstol = 1.0e-12, 
               max iter = 10, stop ratio = 5.0
  equilibrate: on, min_scale = 1.0e-04, max_scale = 1.0e+04
               max iter = 10

iter    pcost        dcost       gap       pres      dres      k/t        μ       step      
---------------------------------------------------------------------------------------------
  0   6.8149e-18  -8.7522e+00  8.75e+00  8.87e-02  2.53e-01  1.00e+00  9.64e+01   ------   
  1   4.0721e+00   1.5813e+02  3.78e+01  1.40e-01  4.46e-01  1.72e+02  1.15e+01  9.90e-01  
  2   2.9150e+01   3.5232e+02  1.11e+01  2.26e-02  9.35e-02  3.28e+02  8.50e-01  9.26e-01  
  3   4.7036e+01   4.7082e+01  9.89e-04  4.74e-03  2.28e-02  1.27e+00  6.15e-01  4.12e-01  
  4   3.6893e+01   4.1455e+01  1.24e-01  6.33e-04  2.86e-03  4.70e+00  1.38e-01  8.15e-01  
  5   2.8671e+01   2.9210e+01  1.88e-02  4.63e-05  1.95e-04  5.47e-01  1.19e-02  9.28e-01  
  6   2.8857e+01   2.8874e+01  5.95e-04  2.25e-06  9.46e-06  1.76e-02  6.13e-04  9.90e-01  
  7   2.8847e+01   2.8847e+01  7.55e-06  2.78e-08  1.17e-07  2.23e-04  7.56e-06  9.88e-01  
  8   2.8847e+01   2.8847e+01  1.79e-07  6.53e-10  2.75e-09  5.28e-06  1.78e-07  9.76e-01  
  9   2.8847e+01   2.8847e+01  6.97e-09  1.99e-11  1.06e-10  2.06e-07  6.86e-09  9.62e-01  
---------------------------------------------------------------------------------------------
Terminated with status = solved
solve time = 69.4ms
  1.928717 seconds (5.33 M allocations: 361.237 MiB, 2.67% gc time, 99.40% compilation time: 40% of which was recompilation)
Problem statistics
  problem is DCP         : true
  number of variables    : 1 (50 scalar elements)
  number of constraints  : 1 (1 scalar elements)
  number of coefficients : 5_103
  number of atoms        : 7

Solution summary
  termination status : OPTIMAL
  primal status      : FEASIBLE_POINT
  dual status        : FEASIBLE_POINT
  objective value    : 28.8467

Expression graph
  minimize
   └─ * (convex; positive)
      ├─ [0.5;;]
      └─ qol (convex; positive)
         ├─ + (affine; real)
         │  ├─ …
         │  └─ …
         └─ [1;;]
  subject to
   └─ == constraint (affine)
      └─ + (affine; real)
         ├─ sum (affine; real)
         │  └─ …
         └─ [0;;]
# Check the status, optimal value, and minimizer of the problem
problem.status, problem.optval, β̂cls.value
(MathOptInterface.OPTIMAL, 28.846723012740114, [5.509993089173388; 2.383567189260773; … ; -3.5233708752276427; -14.863428523231914;;])
# check constraint satisfaction
sum(β̂cls.value)
7.105427357601002e-15

1.3.3 Sum-to-zero lasso

Suppose we want to know which organisms (OTUs) are associated with the response. We can answer this question using a sum-to-zero constrained lasso \[ \text{minimize} \frac 12 \|\mathbf{y} - \mathbf{X} \beta\|_2^2 + \lambda \|\beta\|_1 \] subject to the constraint \(\sum_{j=1}^p \beta_j = 0\). Varying \(\lambda\) from small to large values will generate a solution path.
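For intuition about why coefficients on the lasso path hit exactly zero, recall that the penalty \(\lambda \|\beta\|_1\) has the soft-thresholding function as its proximal operator. A minimal sketch (the helper name is ours, not from the lecture):

```julia
# soft_threshold is an assumed helper: the proximal operator of λ|·|,
# which shrinks small values to exactly zero.
soft_threshold(z, λ) = sign(z) * max(abs(z) - λ, zero(z))

@show soft_threshold(3.0, 1.0)    # 2.0
@show soft_threshold(-0.4, 1.0)   # shrunk to exactly zero
```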

using Convex

# # Use Mosek solver
# using Mosek
# solver = Mosek.Optimizer

# # Use Gurobi solver
# using Gurobi
# solver = Gurobi.Optimizer

# # Use SCS solver
# using SCS
# solver = SCS.Optimizer

# Use Hypatia solver
using Hypatia
solver = Hypatia.Optimizer

# # Use Clarabel solver
# using Clarabel
# solver = Clarabel.Optimizer

# solve at a grid of λ
λgrid = 0:0.01:0.35

# holder for solution path
β̂path = zeros(length(λgrid), size(X, 2)) # each row is β̂ at a λ
# optimization variable
β̂classo = Variable(size(X, 2))
# obtain solution path using warm start
@time for i in 1:length(λgrid)
    λ = λgrid[i]
    # define the optimization problem: objective plus sum-to-zero constraint
    problem = minimize(
        0.5sumsquares(y - X * β̂classo) + λ * sum(abs, β̂classo), # objective
        [sum(β̂classo) == 0]                                     # constraint
    )
    solve!(problem, solver; silent = true)
    β̂path[i, :] = β̂classo.value
end
  1.819383 seconds (4.05 M allocations: 381.876 MiB, 2.31% gc time, 70.21% compilation time: 14% of which was recompilation)
using CairoMakie

f = Figure()
Axis(
    f[1, 1], 
    title = "Sum-to-Zero Lasso",
    xlabel = L"\lambda",
    ylabel = L"\beta_j"
)
series!(λgrid, β̂path', color = :glasbey_category10_n256)
f

1.3.4 Sum-to-zero group lasso

Suppose we want to do variable selection not at the OTU level, but at the Phylum level. OTUs are clustered into various Phyla. We can answer this question using a sum-to-zero constrained group lasso \[ \text{minimize} \frac 12 \|\mathbf{y} - \mathbf{X} \beta\|_2^2 + \lambda \sum_j \|\mathbf{\beta}_j\|_2 \] subject to the constraint \(\sum_{j=1}^p \beta_j = 0\), where \(\mathbf{\beta}_j\) are regression coefficients corresponding to the \(j\)-th phylum. This is a second-order cone programming (SOCP) problem readily modeled by Convex.jl.
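Analogously to the lasso, the group penalty \(\lambda \|\mathbf{\beta}_j\|_2\) has a blockwise soft-thresholding proximal operator that zeros out whole groups at once. A minimal sketch (the helper name is ours, not from the lecture):

```julia
using LinearAlgebra

# block_soft_threshold is an assumed helper: the proximal operator of
# λ‖·‖₂, which either shrinks a group's norm by λ or sets the whole
# group to exactly zero.
block_soft_threshold(b, λ) = max(1 - λ / norm(b), 0) * b

@show block_soft_threshold([3.0, 4.0], 1.0)   # norm shrinks from 5 to 4
@show block_soft_threshold([0.3, 0.4], 1.0)   # whole group set to zero
```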

Let’s assume each 10 contiguous OTUs belong to one Phylum.

# # Use Mosek solver
# using Mosek
# solver = Mosek.Optimizer

# # Use Gurobi solver
# using Gurobi
# solver = Gurobi.Optimizer

# # Use SCS solver
# using SCS
# solver = SCS.Optimizer

# Use Hypatia solver
using Hypatia
solver = Hypatia.Optimizer

# # Use Clarabel solver
# using Clarabel
# solver = Clarabel.Optimizer

# solve at a grid of λ
λgrid = 0.0:0.005:0.5
β̂pathgrp = zeros(length(λgrid), size(X, 2)) # each row is β̂ at a λ
β̂classo = Variable(size(X, 2))
@time for i in 1:length(λgrid)
    λ = λgrid[i]
    # loss
    obj = 0.5sumsquares(y - X * β̂classo)
    # group lasso penalty term
    for j in 1:(size(X, 2) ÷ 10)
        βj = β̂classo[(10(j - 1) + 1) : 10j]
        obj = obj + λ * norm(βj)
    end
    problem = minimize(obj)
    # constraint
    problem.constraints = [sum(β̂classo) == 0] # constraint
    solve!(problem, solver; silent = true)
    β̂pathgrp[i, :] = β̂classo.value
end
  1.972572 seconds (5.89 M allocations: 588.133 MiB, 3.00% gc time, 62.61% compilation time: 51% of which was recompilation)

We see it took Hypatia less than 2 seconds to solve this seemingly hard optimization problem at 101 different \(\lambda\) values.

using CairoMakie

f = Figure()
Axis(
    f[1, 1], 
    title = "Sum-to-Zero Group Lasso",
    xlabel = L"\lambda",
    ylabel = L"\beta_j"
)
series!(λgrid, β̂pathgrp', color = :glasbey_category10_n256)
f

1.3.5 Example: matrix completion

Load the \(128 \times 128\) Lena picture with missing pixels.

using FileIO

lena = load("lena128missing.png")

# convert to real matrices
Y = Float64.(lena)
128×128 Matrix{Float64}:
 0.0       0.0       0.635294  0.0       …  0.0       0.0       0.627451
 0.627451  0.623529  0.0       0.611765     0.0       0.0       0.388235
 0.611765  0.611765  0.0       0.0          0.403922  0.219608  0.0
 0.0       0.0       0.611765  0.0          0.223529  0.176471  0.192157
 0.611765  0.0       0.615686  0.615686     0.0       0.0       0.0
 0.0       0.0       0.0       0.619608  …  0.0       0.0       0.2
 0.607843  0.0       0.623529  0.0          0.176471  0.192157  0.0
 0.0       0.0       0.623529  0.0          0.0       0.0       0.215686
 0.619608  0.619608  0.0       0.0          0.2       0.0       0.207843
 0.0       0.0       0.635294  0.635294     0.2       0.192157  0.188235
 ⋮                                       ⋱  ⋮                   
 0.345098  0.341176  0.356863  0.513725     0.0       0.0       0.231373
 0.0       0.0       0.0       0.0       …  0.0       0.243137  0.258824
 0.298039  0.415686  0.458824  0.0          0.0       0.0       0.258824
 0.0       0.368627  0.4       0.0          0.0       0.0       0.235294
 0.0       0.0       0.34902   0.0          0.0       0.239216  0.207843
 0.219608  0.0       0.0       0.0          0.0       0.0       0.2
 0.0       0.219608  0.235294  0.356863  …  0.0       0.0       0.0
 0.196078  0.207843  0.211765  0.0          0.0       0.270588  0.345098
 0.192157  0.0       0.196078  0.309804     0.266667  0.356863  0.0

We fill in the missing pixels using a matrix completion technique developed by Candes and Tao \[ \text{minimize } \|\mathbf{X}\|_* \] \[ \text{subject to } x_{ij} = y_{ij} \text{ for all observed entries } (i, j). \] Here \(\|\mathbf{M}\|_* = \sum_i \sigma_i(\mathbf{M})\) is the nuclear norm. In words, we seek the matrix with minimal nuclear norm that agrees with the observed entries. This is a semidefinite programming (SDP) problem readily modeled by Convex.jl.
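A classical first-order alternative to the SDP route (a sketch, not the method used in this lecture) is singular value soft-thresholding, the proximal operator of the nuclear norm:

```julia
using LinearAlgebra

# svt is an assumed helper: soft-threshold the singular values by τ,
# which is the proximal operator of the nuclear norm penalty τ‖·‖_*.
function svt(M::AbstractMatrix, τ::Real)
    F = svd(M)
    return F.U * Diagonal(max.(F.S .- τ, 0)) * F.Vt
end

M = Matrix(Diagonal([3.0, 1.0, 0.2]))
@show svt(M, 0.5)   # singular values 3, 1, 0.2 shrink to 2.5, 0.5, 0
```

Iterating this operator on the projection onto the observed entries is the basis of the singular value thresholding (SVT) algorithm of Cai, Candès, and Shen for matrix completion.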

This example takes longer to run because of the high dimensionality. COSMO.jl seems to be the fastest solver for this problem; the other solvers take an excessively long time.

# Use COSMO solver (fast)
using COSMO
solver = COSMO.Optimizer

# # Use Hypatia solver (slow)
# using Hypatia
# solver = Hypatia.Optimizer()

# # Use Clarabel solver (slow)
# using Clarabel
# solver = Clarabel.Optimizer()

# Linear indices of obs. entries
obsidx = findall(Y[:] .≠ 0.0)
# Create optimization variables
X = Variable(size(Y))
# Set up optimization problem
problem = minimize(nuclearnorm(X))
problem.constraints += X[obsidx] == Y[obsidx]
# Solve the problem by calling solve
@time solve!(problem, solver) # fast
┌ Info: [Convex.jl] Compilation finished: 0.69 seconds, 314.304 MiB of memory allocated
└ @ Convex /Users/huazhou/.julia/packages/Convex/y7lu0/src/solution.jl:107
------------------------------------------------------------------
          COSMO v0.8.9 - A Quadratic Objective Conic Solver
                         Michael Garstka
                University of Oxford, 2017 - 2022
------------------------------------------------------------------

Problem:  x ∈ R^{32897},
          constraints: A ∈ R^{41025x32897} (41281 nnz),
          matrix size to factor: 73922x73922,
          Floating-point precision: Float64
Sets:     DensePsdConeTriangle of dim: 32896 (256x256)
          ZeroSet of dim: 8128
          Nonnegatives of dim: 1
Settings: ϵ_abs = 1.0e-05, ϵ_rel = 1.0e-05,
          ϵ_prim_inf = 1.0e-04, ϵ_dual_inf = 1.0e-04,
          ρ = 0.1, σ = 1e-06, α = 1.6,
          max_iter = 5000,
          scaling iter = 10 (on),
          check termination every 25 iter,
          check infeasibility every 40 iter,
          KKT system solver: QDLDL
Acc:      Anderson Type2{QRDecomp},
          Memory size = 15, RestartedMemory,    
          Safeguarded: true, tol: 2.0
Setup Time: 27.64ms

Iter:   Objective:  Primal Res: Dual Res:   Rho:
1   -1.4585e+03 1.5985e+01  5.9854e-01  1.0000e-01
25   1.4525e+02 5.1105e-02  1.1356e-03  1.0000e-01
50   1.4758e+02 1.1725e-02  1.4834e-03  6.8658e-01
75   1.4797e+02 5.5160e-04  4.7489e-05  6.8658e-01
100  1.4797e+02 1.7304e-05  1.3870e-06  6.8658e-01

------------------------------------------------------------------
>>> Results
Status: Solved
Iterations: 100
Optimal objective: 148
Runtime: 1.147s (1147.0ms)

  2.172648 seconds (3.98 M allocations: 717.390 MiB, 4.64% gc time, 62.44% compilation time: 15% of which was recompilation)
Problem statistics
  problem is DCP         : true
  number of variables    : 1 (16_384 scalar elements)
  number of constraints  : 1 (8_128 scalar elements)
  number of coefficients : 8_128
  number of atoms        : 3

Solution summary
  termination status : OPTIMAL
  primal status      : FEASIBLE_POINT
  dual status        : FEASIBLE_POINT
  objective value    : 147.9711

Expression graph
  minimize
   └─ nuclearnorm (convex; positive)
      └─ 128×128 real variable (id: 104…399)
  subject to
   └─ == constraint (affine)
      └─ + (affine; real)
         ├─ index (affine; real)
         │  └─ …
         └─ 8128×1 Matrix{Float64}
using Images

#Gray.(X.value)
colorview(Gray, X.value)

1.4 Nonlinear programming (NLP)

We use the MLE of the Gamma distribution to illustrate some rudiments of nonlinear programming (NLP) in Julia.

Let \(x_1,\ldots,x_m\) be a random sample from the gamma density \[ f(x) = \Gamma(\alpha)^{-1} \beta^{\alpha} x^{\alpha-1} e^{-\beta x} \] on \((0,\infty)\). The loglikelihood function is \[ L(\alpha, \beta) = m [- \ln \Gamma(\alpha) + \alpha \ln \beta + (\alpha - 1)\overline{\ln x} - \beta \bar{x}], \] where \(\bar{x} = \frac{1}{m} \sum_{i=1}^m x_i\) and \(\overline{\ln x} = \frac{1}{m} \sum_{i=1}^m \ln x_i\).
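The sufficient-statistic form of \(L(\alpha, \beta)\) can be spot-checked against the per-observation sum. For \(\alpha = 1\), \(\ln \Gamma(\alpha) = 0\) and the gamma density reduces to the exponential \(\beta e^{-\beta x}\), so the check needs only base Julia (a hedged sketch, not part of the lecture code):

```julia
using Statistics

# For α = 1 the loglikelihood is L(1, β) = m(lnβ - β x̄),
# which must equal the per-observation sum Σᵢ (lnβ - β xᵢ).
x = [0.5, 1.2, 2.0]
β = 3.0
m = length(x)
ll_suffstat = m * (log(β) - β * mean(x))
ll_direct   = sum(log(β) - β * xi for xi in x)
```

The two quantities agree to floating-point precision, confirming the factored form of the loglikelihood.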

1.4.1 Define NLP optimization problem using Optimization.jl

using Distributions, Optimization, Random, Statistics, SpecialFunctions

Random.seed!(257)

# θ = (α, β)
function gamma_logpdf(x::Vector{<:Real}, θ::Vector{<:Real})
    m = length(x)
    avg = mean(x)
    α, β = θ[1], θ[2]
    logavg = sum(log, x) / m
    m * (- log(gamma(α)) + α * log(β) + (α - 1) * logavg - β * avg)
end

# test data
x = rand(5)
gamma_logpdf(x, [1.0, 1.0])
-3.3064167958026847

Generate data.

Random.seed!(257)

(n, p) = (1000, 2)
(α, β) = 5rand(p)
x = rand(Gamma(α, β), n)
println("True parameter values:")
println("α = ", α, ", β = ", β)
True parameter values:
α = 2.838860122889505, β = 4.699007334188252

We use Optimization.jl to define and solve our NLP problem.

using ForwardDiff, Optimization

# loss function and gradient (by auto-diff)
loss = (θ, p) -> -gamma_logpdf(x, θ)
# more auto-diff choices: https://docs.sciml.ai/Optimization/stable/API/ad/
optf = OptimizationFunction(loss, Optimization.AutoForwardDiff())
# start point
θ₀ = [1.0, 1.0]
# optimization problem
prob = OptimizationProblem(optf, θ₀, lb = [0.0, 0.0], ub = [Inf, Inf])
OptimizationProblem. In-place: true
u0: 2-element Vector{Float64}:
 1.0
 1.0

1.4.2 Optim.jl solver

List of algorithms in Optim.jl.

using OptimizationOptimJL

# BFGS algorithm (gradient based)
sol = solve(prob, BFGS())
# Nelder-Mead algorithm (gradient free)
sol = solve(prob, NelderMead())
# Conjugate Gradient algorithm (gradient based)
sol = solve(prob, ConjugateGradient())

1.4.3 NLopt.jl solver

List of algorithms in OptimizationNLopt.jl.

using OptimizationNLopt

sol = solve(prob, NLopt.LD_LBFGS())

1.4.4 MathOptInterface.jl solver

List of algorithms in OptimizationMOI.jl.

using OptimizationMOI, Ipopt

# Ipopt
sol = solve(prob, Ipopt.Optimizer())
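Whichever solver is used, the returned \(\hat\beta\) can be sanity-checked against the stationarity condition \(\partial L/\partial \beta = m(\alpha/\beta - \bar{x}) = 0\), which pins down \(\hat\beta = \hat\alpha / \bar{x}\) at the MLE. A minimal base-Julia sketch (simulating integer-shape gamma draws as sums of exponentials; illustrative only, not part of the lecture code):

```julia
using Random, Statistics

Random.seed!(257)
# For integer shape α, a Gamma(α, 1/β) draw is a sum of α independent
# Exp(β) draws, and Exp(β) can be simulated as -log(U)/β, U ~ Uniform(0,1).
α, β, n = 3, 2.0, 100_000
x = [sum(-log(rand()) / β for _ in 1:α) for _ in 1:n]
# Stationarity in β: ∂L/∂β = m(α/β - x̄) = 0  ⇒  β̂ = α / x̄.
β̂ = α / mean(x)
```

With the true \(\alpha\) plugged in, \(\hat\beta\) lands close to the true rate, so a solver reporting a \((\hat\alpha, \hat\beta)\) pair that violates \(\hat\beta \approx \hat\alpha/\bar{x}\) has not converged.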